Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow differentiation of arrays and objects for proper empty-object serialization #11

Open
kikito opened this issue Sep 10, 2013 · 18 comments

Comments

@kikito
Copy link

kikito commented Sep 10, 2013

This issue is related to #6, in which a global switch is suggested in order to force empty objects to "always be arrays" or "always be objects".

I would argue that while that will solve some specific issues, it will not resolve the more general one, which is differentiating between empty arrays and empty objects in general. Consider the following string:

local json_str = '{"items":[],"properties":{}}'

Ideally cjson should be able to decode that string to a lua object, that when encoded back to json returns the same string.

assert(cjson.encode(cjson.decode(json_str))

This means that cjson needs a way to "mark" tables that are arrays. This could be used by the decoder itself, but would be useful if exposed, so people could give cjson "hints" about what must be an array, even when empty.

I'm writing a possible Lua implementation here. In my case I'm using metatables, but maybe someone with more C knowledge can think of other ideas.

function cjson.mark_as_array = function(t)
  local mt = getmetatable(t) or {}
  mt.__is_cjson_array = true
  return setmetatable(t, mt)
end

function cjson.is_marked_as_array(t)
  local mt = getmetatable(t)
  return mt and mt.__is_cjson_array
end

cjson.encode should mark all arrays it finds using the cjson.mark_as_array.

On the other hand, when cjson.decode encountered an empty table, it should use cjson.is_marked_as_array to decide whether to return "{}" or "[]". Notice that this would only apply to empty tables. A lua table like {1,2, a = "foo"} should be encoded as an object, even when marked as an array.

I would gladly provide a pull request for this, but I don't do C. Please consider this a feature request.

@sirupsen
Copy link

I'd love to see a workaround for this as well.

@thibaultcha
Copy link

+1, this makes the library hard to use. I am surprised that it can handle sparse arrays but not empty ones? Or am I missing something?

@kikito's solution seems to be a good one. Would you consider a PR on it if someone has the time and knowledge to implement it?

@thibaultcha
Copy link

@kikito Since it's been a while, may I ask if you have found any solution to this problem?

@brimworks
Copy link

I also just ran into this problem. I like @kikito 's idea for implementation, I'll see if I can implement this.

@thibaultcha
Copy link

The project seems dormant so I would consider that before jumping on it...

@brimworks
Copy link

Well, I'm actively using it, so maybe I'll have to take it over?

@brimworks
Copy link

Here is my implementation:

brimworks@3499130

@kikito
Copy link
Author

kikito commented Jun 1, 2015

@brimworks looks good! I would suggest adding some tests on the test folder

@thibaultcha
Copy link

Yay nice! Good job @brimworks! I hope this will be available.

goecho added a commit to goecho/lua-cjson that referenced this issue Jul 21, 2015
@brimworks
Copy link

After using this a bit, I've come to the conclusion that it is better to use the Lua 5.3 idiom of using the __name field of the metatable. This change is two fold:

  • Always lookup "null" in the registry and use that for representing null rather than a NULL lightuserdata. This has the advantage of allowing consumers of cjson to customize their implementation of null (for example, by default __tostring is overloaded). And any code can "mark" a table as "null" by simply setting the __name="null" field of their metatable.
  • Always lookup "array" in the registry and use that as the metatable for any generated arrays. This similarly has the advantage of allowing consumers of cjson to customize the "array" metatable with any other overloads... and when checking if something is an array, the __name="array" field of the metatable just needs to be set.

brimworks@9511071

If other libraries (like lyaml) use a similar implementation, then "null" and "array" can be more universal across libraries rather than the previous __is_cjson_array which is very specific to this library.

If this is interesting to others I can issue a pull request. Just let me know.

Thanks,
-Brian

@mpx
Copy link
Owner

mpx commented Aug 24, 2016

Yes, Lua CJSON should use a metatable to record array vs object to make encoding deterministic. It would also allow code to check whether a table decoded from JSON is an array or object.

It appears each library uses a different method via a metatable. Eg, dkjson uses __jsontype == "object" or "array". It's not clear to me which option would be best, but one of them needs to be chosen.

@brimworks
Copy link

FWIW, I'd suggest using __name field of metatable to determine "object" vs "array" vs "null" vs "number". This aligns with Lua 5.3 luaL_newmetatable(L, name) method which records the name in the __name field of the metatable:

http://www.lua.org/manual/5.3/manual.html#luaL_newmetatable

This is the methodology I've used in my fork of CJSON.

Thanks,
-Brian

@identicalsnowflake
Copy link

Any activity here? I'm coming from Redis, which has cjson builtin to its lua scripting package, so I do not have the option of simply using a fork.

This issue is especially problematic in a Redis context, because data may be subsequently accessed by a language which treats {} and [] differently, resulting in decoding errors.

@penguinol
Copy link

Using redis too, have the same problem. Is there any schedule for resolving this issue?

@brimworks
Copy link

@mpx do plan to follow up on this? If you would prefer to no longer maintain the lua-cjson code, I can take over ownership.

Thanks,
-Brian

@bjornbytes
Copy link

LÖVR is exposing lua-cjson now and would love to see this implemented.

@thibaultcha
Copy link

You might want to check out the OpenResty fork which implements this and a bunch of other fixes/improvements:

https://github.com/openresty/lua-cjson

@ghost
Copy link

ghost commented Aug 3, 2017

@kikito, this untested version doesn't touch object's metatable, implemented via weak cache:

local array_marks = setmetatable({ }, { __mode = 'k' }) -- { [weak t] = true }

function cjson.mark_as_array = function(t)
  array_marks[t] = true
end

function cjson.is_marked_as_array(t)
  return array_marks[t]
end

-- or even simpler

cjson.is_array = setmetatable({ }, { __mode = 'k' }) -- { [weak t] = true }
cjson.is_array[t] = true
print(cjson.is_array[t]) -- true

C variant:

static int array_marks_key; // value is not used, just unique address

static void
get_array_marks(lua_State *L)
{
  // = registry[array_marks_key]
  lua_pushlightuserdata(L, &array_marks_key);
  lua_gettable(L, LUA_REGISTRYINDEX);

  if (lua_isnil(L, -1)) {
    lua_pop(L, 1);

    // arrays = setmetatable({ }, { __mode = 'k' })
    lua_createtable(L, 0, 0); // arrays
    lua_createtable(L, 0, 1); // mt
    lua_pushstring(L, "k");
    lua_setfield(L, -2, "__mode");
    lua_setmetatable(L, -2);

    // registry[array_marks_key] = arrays
    lua_pushlightuserdata(L, &array_marks_key);
    lua_pushvalue(L, -2);
    lua_settable(L, LUA_REGISTRYINDEX);
  }

  return; // array_marks at the top
}

static int
mark_as_array(lua_State *L)
{
  // array_marks[t] = true
  get_array_marks(L);
  lua_pushvalue(L, 1); // t
  lua_pushboolean(L, 1); // true
  lua_settable(L, -3);

  return 0;
}

static int
is_marked_as_array(lua_State *L)
{
  // x = array_marks[t]
  get_array_marks(L);
  lua_pushvalue(L, 1); // t
  lua_gettable(L, -2);

  return 1; // x
}

Usage:

t = { { }, { } }
cjson.mark_as_array(t[2]) -- or cjson.is_array[t[2]] = true

print(cjson.encode(t)) -- expecting [ { }, [ ] ]

I'm into C, but not into pull requests :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants