Proposal for a new API to get type and size info without unpack #119

Open
tagomoris opened this issue Jun 17, 2016 · 7 comments

@tagomoris
Member

@tagomoris tagomoris commented Jun 17, 2016

A serialized MessagePack binary carries the type and, for str/bin, array, and map, the size in a header at the start of the binary.
Currently the only way to learn these is to unpack the whole data, which requires a lot of computing power.

I propose a new API call that returns an array of Hash objects, each containing:

  • :type => (:nil, :string, :int, :float, :array, :map, :ext, ...)
  • :size => (bit width of numeric values, byte length of bin/str, element count of map/array, byte length of ext, ...)
  • :id => ext type id (only for :ext)

Each hash describes one msgpack object. The method scans the whole binary to build the array.
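
A minimal pure-Ruby sketch of such a header scan, for illustration only. The `HeaderScan` module and its method names are hypothetical and not part of msgpack-ruby. It assumes every value is reported in document order (nested elements included), that `:size` for a map counts key-value pairs, that fixints report 7 or 5 bits, and that the input is well-formed (truncated input is not handled).

```ruby
# Hypothetical helper, not part of msgpack-ruby: walks value headers only.
module HeaderScan
  # first byte => [type, width in bits] for fixed-width numbers
  NUMERIC = {
    0xca => [:float, 32], 0xcb => [:float, 64],
    0xcc => [:int, 8], 0xcd => [:int, 16], 0xce => [:int, 32], 0xcf => [:int, 64],
    0xd0 => [:int, 8], 0xd1 => [:int, 16], 0xd2 => [:int, 32], 0xd3 => [:int, 64]
  }.freeze

  # first byte => [type, width in bytes of the length field that follows]
  SIZED = {
    0xc4 => [:bin, 1], 0xc5 => [:bin, 2], 0xc6 => [:bin, 4],
    0xc7 => [:ext, 1], 0xc8 => [:ext, 2], 0xc9 => [:ext, 4],
    0xd9 => [:string, 1], 0xda => [:string, 2], 0xdb => [:string, 4],
    0xdc => [:array, 2], 0xdd => [:array, 4],
    0xde => [:map, 2], 0xdf => [:map, 4]
  }.freeze

  # first byte => payload bytes for fixext 1/2/4/8/16
  FIXEXT = { 0xd4 => 1, 0xd5 => 2, 0xd6 => 4, 0xd7 => 8, 0xd8 => 16 }.freeze

  UINT = { 1 => 'C', 2 => 'n', 4 => 'N' }.freeze # big-endian unsigned

  module_function

  def read_uint(bin, pos, width)
    bin.byteslice(pos, width).unpack1(UINT[width])
  end

  def signed_byte(bin, pos)
    bin.byteslice(pos, 1).unpack1('c')
  end

  # Returns one Hash per value; payload bytes are skipped, never decoded.
  def scan(bin)
    bin = bin.b
    infos = []
    pos = 0
    while pos < bin.bytesize
      b = bin.getbyte(pos)
      pos += 1
      case b
      when 0x00..0x7f then infos << { type: :int, size: 7 }          # positive fixint
      when 0xe0..0xff then infos << { type: :int, size: 5 }          # negative fixint
      when 0x80..0x8f then infos << { type: :map, size: b & 0x0f }   # fixmap
      when 0x90..0x9f then infos << { type: :array, size: b & 0x0f } # fixarray
      when 0xa0..0xbf                                                # fixstr
        size = b & 0x1f
        infos << { type: :string, size: size }
        pos += size
      when 0xc0 then infos << { type: :nil }
      when 0xc2, 0xc3 then infos << { type: :boolean }
      when *NUMERIC.keys
        type, bits = NUMERIC[b]
        infos << { type: type, size: bits }
        pos += bits / 8
      when *FIXEXT.keys
        size = FIXEXT[b]
        infos << { type: :ext, size: size, id: signed_byte(bin, pos) }
        pos += 1 + size
      when *SIZED.keys
        type, width = SIZED[b]
        size = read_uint(bin, pos, width)
        pos += width
        case type
        when :ext
          infos << { type: :ext, size: size, id: signed_byte(bin, pos) }
          pos += 1 + size
        when :string, :bin
          infos << { type: type, size: size }
          pos += size
        else # array/map: the elements follow as ordinary values
          infos << { type: type, size: size }
        end
      else
        raise ArgumentError, format('unused first byte 0x%02x', b)
      end
    end
    infos
  end
end
```

Because array and map elements are ordinary values that directly follow their header, a single linear pass is enough; no recursion or object allocation for payloads is needed.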

What do you think about this idea? @frsyuki

@nurse
Contributor

@nurse nurse commented Jun 17, 2016

@tagomoris
Member Author

@tagomoris tagomoris commented Jun 17, 2016

I found that what I really want is a method to know how many msgpack objects exist in a binary. Hmm.
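
One way to count top-level objects without building them is to skip each value by its header. The following is a rough pure-Ruby sketch under that assumption; `ObjectCounter` and its methods are hypothetical names, not msgpack-ruby API.

```ruby
# Hypothetical helper, not part of msgpack-ruby.
module ObjectCounter
  module_function

  # Big-endian unsigned integer of the given width (1, 2 or 4 bytes).
  def uint(bin, pos, width)
    chunk = bin.byteslice(pos, width)
    raise EOFError, 'truncated header' if chunk.nil? || chunk.bytesize < width
    chunk.unpack1(width == 1 ? 'C' : width == 2 ? 'n' : 'N')
  end

  # Returns [bytes to skip (header plus payload), number of child values].
  def header(bin, pos)
    b = bin.getbyte(pos)
    raise EOFError, 'truncated value' if b.nil?
    case b
    when 0x00..0x7f, 0xe0..0xff, 0xc0, 0xc2, 0xc3 then [1, 0] # fixint, nil, bool
    when 0x80..0x8f then [1, (b & 0x0f) * 2]                   # fixmap: key + value
    when 0x90..0x9f then [1, b & 0x0f]                         # fixarray
    when 0xa0..0xbf then [1 + (b & 0x1f), 0]                   # fixstr
    when 0xc4..0xc6                                            # bin 8/16/32
      w = 1 << (b - 0xc4)
      [1 + w + uint(bin, pos + 1, w), 0]
    when 0xc7..0xc9                                            # ext 8/16/32
      w = 1 << (b - 0xc7)
      [1 + w + 1 + uint(bin, pos + 1, w), 0]
    when 0xca then [5, 0]                                      # float 32
    when 0xcb then [9, 0]                                      # float 64
    when 0xcc..0xcf then [1 + (1 << (b - 0xcc)), 0]            # uint 8..64
    when 0xd0..0xd3 then [1 + (1 << (b - 0xd0)), 0]            # int 8..64
    when 0xd4..0xd8 then [2 + (1 << (b - 0xd4)), 0]            # fixext: type + data
    when 0xd9..0xdb                                            # str 8/16/32
      w = 1 << (b - 0xd9)
      [1 + w + uint(bin, pos + 1, w), 0]
    when 0xdc, 0xdd                                            # array 16/32
      w = b == 0xdc ? 2 : 4
      [1 + w, uint(bin, pos + 1, w)]
    when 0xde, 0xdf                                            # map 16/32
      w = b == 0xde ? 2 : 4
      [1 + w, uint(bin, pos + 1, w) * 2]
    else
      raise ArgumentError, format('unused first byte 0x%02x', b)
    end
  end

  # Counts complete top-level objects; nested values are skipped, not counted.
  def count(bin)
    bin = bin.b
    pos = 0
    objects = 0
    while pos < bin.bytesize
      pending = 1 # values still to skip for the current top-level object
      while pending > 0
        skip, children = header(bin, pos)
        pos += skip
        pending += children - 1
      end
      raise EOFError, 'truncated payload' if pos > bin.bytesize
      objects += 1
    end
    objects
  end
end
```

The `pending` counter replaces recursion: every header consumes one pending value and adds its children, so deeply nested data does not grow the call stack.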

@tagomoris
Member Author

@tagomoris tagomoris commented Jun 17, 2016

@nurse I updated the description above. Is it still similar to yours?

@nurse
Contributor

@nurse nurse commented Jun 17, 2016

Your proposal needs a cursor (or an API that really skips objects without allocating them). Mine contains such an implementation as a cursor, but yours seems to want to move the unpacker's position.

@frsyuki
Member

@frsyuki frsyuki commented Jun 17, 2016

Doesn't Unpacker#read_array_header or Unpacker#read_map_header work?
But anyway, it sounds like a good idea to add methods, in addition to the ones above, that tell:

  • type of next value
  • length of next str or bin
  • type and length of next ext
@frsyuki
Member

@frsyuki frsyuki commented Jun 17, 2016

Returning a class instance (an immutable struct) is better than a Hash in terms of performance.
I think it should be 2 methods, because checking the type needs only 1 byte but checking the length needs more bytes:

class Unpacker
  #
  # Returns type of next value.
  # This peeks 1 byte from the underlying internal buffer.
  #
  # :nil, :string, :integer, :float, :array, :map, ...
  #
  # @return Symbol
  def peek_next_type
  end

  #
  # Returns ValueInfo of next value.
  # This peeks 1 - 5 bytes from the underlying internal buffer.
  #
  # @return ValueInfo or its subclass
  def peek_next_value_info
  end
end

class ValueInfo
  # Returns type of the value (:string, :binary, :integer, ...)
  def type
  end

  # Returns size of string, binary, or extension value
  def size
  end

  alias_method :length, :size

  # Returns type id of extension value
  def ext_type_id
  end

  # Returns true if type is :string or :binary
  def raw?
  end

  # Returns true if type is :integer or :float
  def number?
  end
end
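
To make the peek semantics concrete, here is a rough pure-Ruby sketch over a plain String buffer. `PeekingReader` is a hypothetical stand-in for the Unpacker, and `:boolean` and `:extension` are assumed type names; nothing here is existing msgpack-ruby API. Note that in this sketch, reading the type id of an ext 32 value touches a sixth byte, since the id follows the 4-byte length.

```ruby
# Hypothetical sketch, not part of msgpack-ruby.
class ValueInfo
  attr_reader :type, :size, :ext_type_id
  alias_method :length, :size

  def initialize(type, size = nil, ext_type_id = nil)
    @type = type
    @size = size
    @ext_type_id = ext_type_id
    freeze # immutable once built
  end

  def raw?
    type == :string || type == :binary
  end

  def number?
    type == :integer || type == :float
  end
end

class PeekingReader
  UINT = { 1 => 'C', 2 => 'n', 4 => 'N' }.freeze # big-endian unsigned

  def initialize(buffer)
    @buffer = buffer.b
    @pos = 0 # peek_* methods never advance this
  end

  # Looks at the first byte only.
  def peek_next_type
    type_of(first_byte)
  end

  # Looks at the first byte plus the length field (and ext type id) if any.
  def peek_next_value_info
    b = first_byte
    type = type_of(b)
    case b
    when 0xa0..0xbf then ValueInfo.new(type, b & 0x1f)                     # fixstr
    when 0xc4..0xc6 then ValueInfo.new(type, uint(1 << (b - 0xc4)))        # bin 8/16/32
    when 0xd9..0xdb then ValueInfo.new(type, uint(1 << (b - 0xd9)))        # str 8/16/32
    when 0xd4..0xd8 then ValueInfo.new(type, 1 << (b - 0xd4), signed_byte(@pos + 1)) # fixext
    when 0xc7..0xc9                                                        # ext 8/16/32
      width = 1 << (b - 0xc7)
      ValueInfo.new(type, uint(width), signed_byte(@pos + 1 + width))
    else
      ValueInfo.new(type)
    end
  end

  private

  def first_byte
    @buffer.getbyte(@pos) || raise(EOFError, 'buffer is empty')
  end

  def uint(width)
    @buffer.byteslice(@pos + 1, width).unpack1(UINT[width])
  end

  def signed_byte(pos)
    @buffer.byteslice(pos, 1).unpack1('c')
  end

  def type_of(b)
    case b
    when 0x00..0x7f, 0xcc..0xd3, 0xe0..0xff then :integer
    when 0x80..0x8f, 0xde, 0xdf then :map
    when 0x90..0x9f, 0xdc, 0xdd then :array
    when 0xa0..0xbf, 0xd9..0xdb then :string
    when 0xc0 then :nil
    when 0xc2, 0xc3 then :boolean
    when 0xc4..0xc6 then :binary
    when 0xc7..0xc9, 0xd4..0xd8 then :extension
    when 0xca, 0xcb then :float
    else raise ArgumentError, format('unused first byte 0x%02x', b)
    end
  end
end
```

A plain frozen class is used instead of `Struct` here because `Struct` already defines `size` and `length` (the member count), which would clash with a `size` member.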

These are optional (my bet is on NOT adding them until someone actually needs them):

class Unpacker
  #
  # Returns format of next value
  #
  # :nil, :str8, :str16, :str32, :positive_fixint, :negative_fixint, ...
  # 
  def peek_next_format
  end
end
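
As a rough illustration, the format can be derived from the first byte alone. The `FormatTable` module below is hypothetical; format names other than the ones listed above are assumptions that follow the MessagePack specification's naming in lowercase.

```ruby
# Hypothetical helper, not part of msgpack-ruby.
module FormatTable
  # Formats for first bytes 0xc0..0xdf, in byte order.
  SINGLE = %i[
    nil never_used false true
    bin8 bin16 bin32 ext8 ext16 ext32
    float32 float64
    uint8 uint16 uint32 uint64
    int8 int16 int32 int64
    fixext1 fixext2 fixext4 fixext8 fixext16
    str8 str16 str32
    array16 array32 map16 map32
  ].freeze

  module_function

  # Maps a first byte (0..255) to a format name.
  def format_of(byte)
    case byte
    when 0x00..0x7f then :positive_fixint
    when 0x80..0x8f then :fixmap
    when 0x90..0x9f then :fixarray
    when 0xa0..0xbf then :fixstr
    when 0xc0..0xdf then SINGLE[byte - 0xc0]
    when 0xe0..0xff then :negative_fixint
    else raise ArgumentError, "not a byte: #{byte.inspect}"
    end
  end

  # Peeks the format of the value at the start of a String buffer.
  def peek_format(buffer)
    byte = buffer.getbyte(0)
    raise EOFError, 'buffer is empty' if byte.nil?
    format_of(byte)
  end
end
```

Unlike a type, a format also fixes the width of the length field, which is what a caller would need in order to read the payload next.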

This is how msgpack-java designs data model around types:

@frsyuki
Member

@frsyuki frsyuki commented Jun 17, 2016

I think it's NOT a good idea to add read_next_type, because moving the cursor leaves the internal state of the Unpacker unusable: there are no methods to read only the payload.

Adding such a method, as msgpack-java does, is another idea, but it would need read_next_format instead of read_next_type, returning a MessageFormat class, because read_next_type doesn't tell the length of the payload that users need to read next. That's another topic.
